Abstract

A field known as Compressive Sensing (CS) has recently emerged to help address the growing 
challenges of capturing and processing high-dimensional signals and data sets. CS exploits the 
surprising fact that the information contained in a sparse signal can be preserved in a small 
number of compressive (or random) linear measurements of that signal. Strong theoretical 
guarantees have been established on the accuracy to which sparse or near-sparse signals can be 
recovered from noisy compressive measurements. In this paper, we address similar questions in 
the context of a different modeling framework. Instead of sparse models, we focus on the broad 
class of manifold models, which can arise in both parametric and non-parametric signal families. 
Building upon recent results concerning the stable embeddings of manifolds within the measurement
space, we establish both deterministic and probabilistic instance-optimal bounds in $\ell_2$ for
manifold-based signal recovery and parameter estimation from noisy compressive measurements.
In line with analogous results for sparsity-based CS, we conclude that much stronger bounds 
are possible in the probabilistic setting. Our work supports the growing empirical evidence that 
manifold-based models can be used with high accuracy in compressive signal processing. 

Keywords. Manifolds, dimensionality reduction, random projections, Compressive Sensing, spar- 
sity, signal recovery, parameter estimation, Johnson-Lindenstrauss lemma. 
1 Introduction

1.1 Concise signal models 

A significant byproduct of the Information Age has been an explosion in the sheer quantity of raw 
data demanded from sensing systems. From digital cameras to mobile devices, scientific computing 
to medical imaging, and remote surveillance to signals intelligence, the size (or dimension) of a 
typical desired signal continues to increase. Naturally, the dimension N imposes a direct burden on 
the various stages of the data processing pipeline, from the data acquisition itself to the subsequent 
transmission, storage, and/or analysis; and despite rapid and continual improvements in computer 
processing power, other bottlenecks do remain, such as communication bandwidth over wireless 
channels, battery power in remote sensors and handheld devices, and the resolution/bandwidth of 
analog-to-digital converters. 



*Division of Engineering, Colorado School of Mines. Email: mwakin@mines.edu. This research was partially
supported by NSF Grant DMS-0603606 and DARPA Grant HR0011-08-1-0078. The content of this article does not
necessarily reflect the position or the policy of the Government and no official endorsement should be inferred. 






Fortunately, in many cases, the information contained within a high-dimensional signal actually 
obeys some sort of concise, low-dimensional model. Such a signal may be described as having just 
$K \ll N$ degrees of freedom for some $K$. Periodic signals bandlimited to a certain frequency are one
example; they live along a fixed $K$-dimensional linear subspace of $\mathbb{R}^N$. Piecewise smooth signals
are an example of sparse signals, which can be written as a succinct linear combination of just
$K$ elements from some basis such as a wavelet dictionary. Still other signals may live along $K$-dimensional
submanifolds of the ambient signal space $\mathbb{R}^N$; examples include collections of signals
observed from multiple viewpoints in a camera or sensor network. In general, the conciseness of 
these models suggests the possibility for efficient processing and compression of these signals. 

1.2 Compressive measurements 

Recently, the conciseness of certain signal models has led to the use of compressive measurements
for simplifying the data acquisition process. Rather than designing a sensor to measure a signal
$x \in \mathbb{R}^N$, for example, it often suffices to design a sensor that can measure a much shorter vector
$y = \Phi x$, where $\Phi$ is a linear measurement operator represented as an $M \times N$ matrix, and where
typically $M \ll N$. As we discuss below in the context of Compressive Sensing (CS), when $\Phi$ is
properly designed, the requisite number of measurements $M$ typically scales with the information
level $K$ of the signal, rather than with its ambient dimension $N$.

Surprisingly, the requirements on the measurement matrix $\Phi$ can often be met by choosing $\Phi$
randomly from an acceptable distribution. One distribution allows the entries of $\Phi$ to be chosen as
i.i.d. Gaussian random variables; another dictates that $\Phi$ has orthogonal rows that span a random
$M$-dimensional subspace of $\mathbb{R}^N$.

Physical architectures have been proposed for hardware that will enable the acquisition of signals 
using compressive measurements [11,21,24,28]. The potential benefits for data acquisition are 
numerous. These systems can enable simple, low-cost acquisition of a signal directly in compressed 
form without requiring knowledge of the signal structure in advance. Some of the many possible 
applications include distributed source coding in sensor networks [4], medical imaging [29], high-rate
analog-to-digital conversion [11,24,28], and error control coding [8]. 

1.3 Signal understanding from compressive measurements 

Having acquired a signal x in compressed form (in the form of a measurement vector y), there are 
many questions that may then be asked of the signal. These include: 

Q1. Recovery: What was the original signal $x$?

Q2. Sketching: Supposing that $x$ was sparse or nearly so, what were the $K$ basis vectors used to
generate $x$?

Q3. Parameter estimation: Supposing $x$ was generated from a $K$-dimensional parametric model,
what was the original $K$-dimensional parameter that generated $x$?

Given only the measurements $y$ (possibly corrupted by noise), solving any of the above problems
requires exploiting the concise, $K$-dimensional structure inherent in the signal.¹ CS addresses
questions Q1 and Q2 under the assumption that the signal $x$ is $K$-sparse (or approximately so) in
some basis or dictionary; in Section 2 we outline several key theoretical bounds from CS regarding
the accuracy to which these questions may be answered.

¹ Other problems, such as finding the nearest neighbor to $x$ in a large database of signals [27], can also be solved
using compressive measurements and do not require assumptions about the concise structure in $x$.
Figure 1: (a) The articulated signal $f_\theta(t) = g(t - \theta)$ is defined via shifts of a primitive function $g$, where $g$ is
a Gaussian pulse. Each signal is sampled at $N$ points, and as $\theta$ changes, the resulting signals trace out a 1-D
manifold in $\mathbb{R}^N$. (b) Projection of the manifold from $\mathbb{R}^N$ onto a random 3-D subspace; the color/shading
represents different values of $\theta \in [0, 1]$.

1.4 Manifold models for signal understanding 

In this paper, we will address these questions in the context of a different modeling framework for 
concise signal structure. Instead of sparse models, we focus on the broad class of manifold models, 
which arise both in settings where a $K$-dimensional parameter $\theta$ controls the generation of the
signal and also in non-parametric settings. 

As a very simple illustration, consider the articulated signal in Figure 1(a). We let $g(t)$ be a
fixed continuous-time Gaussian pulse centered at $t = 0$ and consider a shifted version of $g$ denoted
as the parametric signal $f_\theta(t) := g(t - \theta)$ with $t, \theta \in [0, 1]$. We then suppose the discrete-time
signal $x = x_\theta \in \mathbb{R}^N$ arises by sampling the continuous-time signal $f_\theta(t)$ uniformly in time, i.e.,
$x_\theta(n) = f_\theta(n/N)$ for $n = 1, 2, \ldots, N$. As the parameter $\theta$ changes, the signals $x_\theta$ trace out a
continuous one-dimensional (1-D) curve $\mathcal{M} = \{x_\theta : \theta \in [0, 1]\} \subset \mathbb{R}^N$. The conciseness of our
model (in contrast with the potentially high dimension $N$ of the signal space) is reflected in the
low dimension of the path $\mathcal{M}$.

In the real world, manifold models may arise in a variety of settings. A $K$-dimensional parameter
$\theta$ could reflect uncertainty about the 1-D timing of the arrival of a signal (as in Figure 1(a)), the
2-D orientation and position of an edge in an image, the 2-D translation of an image under study, 
the multiple degrees of freedom in positioning a camera or sensor to measure a scene, the physical 
degrees of freedom in an articulated robotic or sensing system, or combinations of the above. 
Manifolds have also been proposed as approximate models for signal databases such as collections 
of images of human faces or of handwritten digits [5, 26, 35]. 

Consequently, the potential applications of manifold models are numerous in signal processing. 
In some applications, the signal x itself may be the object of interest, and the concise manifold 
model may facilitate the acquisition or compression of that signal. Alternatively, in parametric 
settings one may be interested in using a signal $x = x_\theta$ to infer the parameter $\theta$ that generated
that signal. In an application known as manifold learning, one may be presented with a collection
of data $\{x_{\theta_1}, x_{\theta_2}, \ldots, x_{\theta_n}\}$ sampled from a parametric manifold and wish to discover the underlying
parameterization that generated that manifold. Multiple manifolds can also be considered simultaneously,
for example in problems that require recognizing an object from one of $n$ possible classes,
where the viewpoint of the object is uncertain during the image capture process. In this case, we 
may wish to know which of n manifolds is closest to the observed image x. 

While any of these questions may be answered with full knowledge of the high-dimensional signal 
$x \in \mathbb{R}^N$, there is growing theoretical and experimental support that they can also be answered from
only compressive measurements $y = \Phi x$. In a recent paper [3], we have shown that given a sufficient
number $M$ of random measurements, one can ensure with high probability that a manifold $\mathcal{M} \subset \mathbb{R}^N$
has a stable embedding in the measurement space $\mathbb{R}^M$ under the operator $\Phi$, such that pairwise
Euclidean and geodesic distances are approximately preserved on its image $\Phi\mathcal{M}$. We restate the
precise result in Section 3, but a key aspect is that the number of requisite measurements M is 
linearly proportional to the information level of the signal, i.e., the dimension K of the manifold. 

As a very simple illustration of this embedding phenomenon, Figure 1(b) presents an experiment
where just $M = 3$ compressive measurements are acquired from each point $x_\theta$ described in
Figure 1(a). We let $N = 1024$ and construct a randomly generated $3 \times N$ matrix $\Phi$ with orthogonal
rows. Each point $x_\theta$ from the original manifold $\mathcal{M} \subset \mathbb{R}^{1024}$ maps to a unique point $\Phi x_\theta$ in $\mathbb{R}^3$; the
manifold embeds in the low-dimensional measurement space. Given any $y = \Phi x_{\theta'}$ for $\theta'$ unknown,
then, it is possible to infer the value $\theta'$ using only knowledge of the parametric model for $\mathcal{M}$ and
the measurement operator $\Phi$. Moreover, as the number $M$ of compressive measurements increases,
the manifold embedding becomes much more stable and remains highly self-avoiding.
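The Figure 1(b) experiment can be imitated in a few lines of Python. This is a minimal sketch under our own assumptions, not the code behind the figure: the pulse width `SIGMA`, the random seed, and the parameter grid are hypothetical choices (the text does not give them), the orthoprojector is built by orthonormalizing a Gaussian matrix, and $\theta'$ is inferred by a brute-force nearest-neighbor search over a sampled copy of $\Phi\mathcal{M}$.

```python
import numpy as np

N = 1024       # ambient dimension, as in the Figure 1(b) experiment
M = 3          # number of compressive measurements
SIGMA = 0.05   # pulse width: an assumed value, not specified in the text


def pulse_signal(theta, n=N, sigma=SIGMA):
    """Samples of f_theta(t) = g(t - theta) at t = k/n, k = 1, ..., n (g a Gaussian pulse)."""
    t = np.arange(1, n + 1) / n
    return np.exp(-((t - theta) ** 2) / (2 * sigma ** 2))


def random_orthoprojector(m, n, rng):
    """sqrt(n/m) times an m x n matrix whose orthonormal rows span a random subspace."""
    q, _ = np.linalg.qr(rng.standard_normal((n, m)))  # n x m, orthonormal columns
    return np.sqrt(n / m) * q.T


rng = np.random.default_rng(0)
Phi = random_orthoprojector(M, N, rng)

# Sample the manifold densely and map each sampled point into R^3.
grid = np.linspace(0.0, 1.0, 2001)
embedded = np.stack([Phi @ pulse_signal(th) for th in grid])  # shape (2001, 3)


def estimate_theta(y):
    """Return the grid value whose image Phi x_theta is closest to y."""
    return grid[np.argmin(np.linalg.norm(embedded - y, axis=1))]


theta_true = 0.3141
y = Phi @ pulse_signal(theta_true)   # M = 3 noise-free measurements
theta_hat = estimate_theta(y)
print(theta_true, theta_hat)
```

Because $y$ here lies exactly on the embedded curve, the search lands on a grid point adjacent to $\theta'$. With measurement noise and only $M = 3$ measurements, a distant part of the projected curve can come close to $y$, which is one way to see why a larger $M$ yields a more stable embedding.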

Indeed, there is strong empirical evidence that, as a consequence of this phenomenon, questions 
such as Q1 (signal recovery) and Q3 (parameter estimation) can be accurately solved using only
compressive measurements of a signal $x$, and that these procedures are robust to noise and to
deviations of the signal $x$ away from the manifold $\mathcal{M}$ [15, 36]. Additional theoretical and empirical
justification has followed for the manifold learning [25] and multiclass recognition problems [15] 
described above. Consequently, many of the advantages of compressive measurements that are 
beneficial in sparsity-based CS (low-cost sensor design, reduced transmission requirements, reduced 
storage requirements, lack of need for advance knowledge of signal structure, simplified computation 
in the low-dimensional space $\mathbb{R}^M$, etc.) may also be enjoyed in settings where manifold models
capture the concise signal structure. Moreover, the use of a manifold model can often capture the 
structure of a signal in many fewer degrees of freedom K than would be required in any sparse 
representation, and thus the measurement rate M can be greatly reduced compared to sparsity- 
based CS approaches. 

In this paper, we will focus on questions Q1 (signal recovery) and Q3 (parameter estimation)
and reinforce the existing empirical work by establishing theoretical bounds on the accuracy to
which these questions may be answered. We will consider both deterministic and probabilistic
instance-optimal bounds, and we will see strong similarities to analogous results that have been
derived for sparsity-based CS. As with sparsity-based CS, we show for manifold-based CS that, for
any fixed $\Phi$, uniform deterministic $\ell_2$ bounds for the recovery of all $x$ are necessarily poor. We
then show that, as with sparsity-based CS, providing for any $x$ a probabilistic bound that holds over
most $\Phi$ is possible with the desired accuracy. We consider both noise-free and noisy measurement
settings and compare our bounds with sparsity-based CS. 

1.5 Paper organization 

We begin in Section 2 with a brief review of CS topics, to set notation and to outline several key 
results for later comparison. In Section 3 we discuss manifold models in more depth, restate our 
previous bound regarding stable embeddings of manifolds, and formalize our criteria for answering 
questions Q1 and Q3 in the context of manifold models. In Section 4, we confront the task of
deriving deterministic instance-optimal bounds in $\ell_2$. In Section 5, we consider instead probabilistic
instance-optimal bounds in $\ell_2$. We conclude in Section 6 with a final discussion.
2 Sparsity-Based Compressive Sensing



2.1 Sparse models 

The concise modeling framework used in Compressive Sensing (CS) is sparsity. Consider a signal
$x \in \mathbb{R}^N$ and suppose the $N \times N$ matrix $\Psi = [\psi_1\ \psi_2\ \cdots\ \psi_N]$ forms an orthonormal basis for $\mathbb{R}^N$.
We say $x$ is $K$-sparse in the basis $\Psi$ if for $\alpha \in \mathbb{R}^N$ we can write

$$x = \Psi\alpha,$$

where $\|\alpha\|_0 = K < N$. (The $\ell_0$-norm notation counts the number of nonzero entries of $\alpha$.)
In a sparse representation, the actual information content of a signal is contained exclusively in the
$K < N$ positions and values of its nonzero coefficients.

For those signals that are approximately sparse, we may measure their proximity to sparse
signals as follows. We define $\alpha_K \in \mathbb{R}^N$ to be the vector containing only the $K$ largest-magnitude entries of
$\alpha$, with the remaining entries set to zero. Similarly, we let $x_K = \Psi\alpha_K$. It is then common to
measure the proximity to sparseness using either $\|\alpha - \alpha_K\|_1$ or $\|\alpha - \alpha_K\|_2$ (the latter of which
equals $\|x - x_K\|_2$ because $\Psi$ is orthonormal).
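As a small illustration of these proximity measures, the sketch below forms $\alpha_K$ by keeping the $K$ largest-magnitude entries and checks numerically that $\|\alpha - \alpha_K\|_2 = \|x - x_K\|_2$ when $\Psi$ is orthonormal. The random orthonormal basis and the power-law decay of the coefficients are assumptions made only for the example.

```python
import numpy as np


def best_k_term(alpha, k):
    """Keep the k largest-magnitude entries of alpha and zero the rest (this is alpha_K)."""
    alpha_k = np.zeros_like(alpha)
    idx = np.argsort(np.abs(alpha))[-k:]   # indices of the k largest |alpha_i|
    alpha_k[idx] = alpha[idx]
    return alpha_k


rng = np.random.default_rng(1)
N, K = 64, 5
# Assumed basis: a random orthonormal Psi (a wavelet basis would play this role in practice).
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))
# Assumed compressible coefficients: magnitudes decaying roughly like i^(-1.5).
alpha = rng.standard_normal(N) * np.arange(1, N + 1) ** -1.5
x = Psi @ alpha

alpha_K = best_k_term(alpha, K)
x_K = Psi @ alpha_K

l1_tail = np.linalg.norm(alpha - alpha_K, 1)   # ||alpha - alpha_K||_1
l2_tail = np.linalg.norm(alpha - alpha_K, 2)   # ||alpha - alpha_K||_2
print(l1_tail, l2_tail, np.linalg.norm(x - x_K))
```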

2.2 Compressive measurements 

CS uses the concept of sparsity to simplify the data acquisition process. Rather than designing 
a sensor to measure a signal $x \in \mathbb{R}^N$, for example, it often suffices to design a sensor that can
measure a much shorter vector $y = \Phi x$, where $\Phi$ is a linear measurement operator represented as
an $M \times N$ matrix, and typically $M \ll N$.

The measurement matrix $\Phi$ must have certain properties in order to be suitable for CS. One
desirable property (which leads to the theoretical results we mention in Section 2.3) is known as
the Restricted Isometry Property (RIP) [7, 9, 10]. We say a matrix $\Phi$ meets the RIP of order $K$
with respect to the basis $\Psi$ if for some $\delta_K > 0$,

$$(1 - \delta_K)\,\|\alpha\|_2 \le \|\Phi\Psi\alpha\|_2 \le (1 + \delta_K)\,\|\alpha\|_2$$

holds for all $\alpha \in \mathbb{R}^N$ with $\|\alpha\|_0 \le K$. Intuitively, the RIP can be viewed as guaranteeing a stable
embedding of the collection of $K$-sparse signals within the measurement space $\mathbb{R}^M$. In particular,
supposing the RIP of order $2K$ is satisfied with respect to the basis $\Psi$, then for all pairs of $K$-sparse
signals $x_1, x_2 \in \mathbb{R}^N$, we have

$$(1 - \delta_{2K})\,\|x_1 - x_2\|_2 \le \|\Phi x_1 - \Phi x_2\|_2 \le (1 + \delta_{2K})\,\|x_1 - x_2\|_2. \tag{1}$$
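The passage from the RIP of order $2K$ to (1) takes one step. Writing $x_1 = \Psi\alpha_1$ and $x_2 = \Psi\alpha_2$ with $\|\alpha_1\|_0, \|\alpha_2\|_0 \le K$, the difference $\alpha_1 - \alpha_2$ has at most $2K$ nonzeros, so the RIP applies to it:

```latex
\|\alpha_1 - \alpha_2\|_0 \le \|\alpha_1\|_0 + \|\alpha_2\|_0 \le 2K,
\qquad
\Phi x_1 - \Phi x_2 = \Phi\Psi(\alpha_1 - \alpha_2),
\qquad
\|x_1 - x_2\|_2 = \|\alpha_1 - \alpha_2\|_2
% (the last identity holds because \Psi is orthonormal), hence
(1 - \delta_{2K})\,\|x_1 - x_2\|_2
  \le \|\Phi\Psi(\alpha_1 - \alpha_2)\|_2 = \|\Phi x_1 - \Phi x_2\|_2
  \le (1 + \delta_{2K})\,\|x_1 - x_2\|_2 .
```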

Although deterministic constructions of matrices meeting the RIP are still a work in progress,
it is known that the RIP can often be met by choosing $\Phi$ randomly from an acceptable distribution.
For example, let $\Psi$ be a fixed orthonormal basis for $\mathbb{R}^N$ and suppose that

$$M \ge C_0 K \log(N/K) \tag{2}$$

for some constant $C_0$. Then supposing that the entries of the $M \times N$ matrix $\Phi$ are drawn as
independent, identically distributed Gaussian random variables with mean 0 and variance $1/M$, it
follows that with high probability $\Phi$ meets the RIP of order $K$ with respect to the basis $\Psi$. Two
aspects of this construction deserve special notice: first, the number $M$ of measurements required
is linearly proportional to the information level $K$, and second, neither the sparse basis $\Psi$ nor the
locations of the nonzero entries of $\alpha$ need be known when designing the measurement operator $\Phi$.
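The following sketch draws such a Gaussian $\Phi$ and measures $\|\Phi\alpha\|_2 / \|\alpha\|_2$ on random $K$-sparse vectors (taking $\Psi = I$). It is only a sanity check under assumptions: $C_0 = 2$ is an arbitrary illustrative constant, and sampling random sparse vectors probes typical behavior; it does not certify the RIP, which is a worst-case statement over all $K$-sparse vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 512, 10
C0 = 2.0                                     # illustrative; the constant C_0 in (2) is unspecified
M = int(np.ceil(C0 * K * np.log(N / K)))     # a measurement budget of the form (2)

# i.i.d. Gaussian entries, mean 0 and variance 1/M, so that E||Phi a||_2^2 = ||a||_2^2.
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))


def random_sparse(rng, n, k):
    """A vector with k nonzero Gaussian entries on a uniformly random support."""
    a = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    a[support] = rng.standard_normal(k)
    return a


# Psi = I is assumed, so the K-sparse coefficient vectors are the signals themselves.
ratios = np.array([
    np.linalg.norm(Phi @ a) / np.linalg.norm(a)
    for a in (random_sparse(rng, N, K) for _ in range(2000))
])
print(M, ratios.min(), ratios.max())
```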
Other random distributions for $\Phi$ may also be used, all requiring approximately the same number
of measurements. One of these distributions [2, 14] dictates that $\Phi = \sqrt{N/M}\, H$, where $H$ is an
$M \times N$ matrix having orthonormal rows that span a random $M$-dimensional subspace of $\mathbb{R}^N$. We
refer to such a choice of $\Phi$ as a random orthoprojector.²
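One way to sample a random orthoprojector, sketched below, is to orthonormalize an $N \times M$ Gaussian matrix with a QR factorization; this construction is our own illustrative choice rather than a procedure given in the text. The example also checks that, thanks to the $\sqrt{N/M}$ factor, $\Phi\Phi^T = (N/M) I$ and $\|\Phi x\|_2^2$ matches $\|x\|_2^2$ on average over draws of $\Phi$.

```python
import numpy as np


def random_orthoprojector(M, N, rng):
    """Phi = sqrt(N/M) * H, with H an M x N matrix of orthonormal rows spanning a random subspace.

    The column span of an N x M Gaussian matrix is a uniformly distributed M-dimensional
    subspace of R^N, so orthonormalizing it with a QR factorization gives a valid H.
    """
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))   # N x M, orthonormal columns
    H = Q.T                                            # M x N, orthonormal rows
    return np.sqrt(N / M) * H


rng = np.random.default_rng(3)
N, M = 256, 16
Phi = random_orthoprojector(M, N, rng)
x = rng.standard_normal(N)

# Over independent draws, ||Phi x||^2 / ||x||^2 averages to 1 because of the sqrt(N/M) factor.
energy_ratios = np.array([
    np.linalg.norm(random_orthoprojector(M, N, rng) @ x) ** 2 / np.linalg.norm(x) ** 2
    for _ in range(500)
])
print(energy_ratios.mean())
```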



2.3 Signal recovery and sketching 

Although the sparse structure of a signal $x$ need not be known when collecting measurements
$y = \Phi x$, a hallmark of CS is the use of the sparse model in order to facilitate understanding from
the compressive measurements. A variety of algorithms have been proposed to answer Q1 (signal
recovery), where we seek to solve the apparently underdetermined set of $M$ linear equations $y = \Phi x$
for $N$ unknowns. The canonical method [6, 10, 17] is known as $\ell_1$-minimization and is formulated
as follows: first solve

$$\hat{\alpha} = \arg\min_{\alpha'} \|\alpha'\|_1 \ \text{subject to}\ y = \Phi\Psi\alpha', \tag{3}$$

and then set $\hat{x} = \Psi\hat{\alpha}$. Under this recovery program, the following bounds are known.
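Program (3) can be posed as a linear program by splitting $\alpha' = p - q$ with $p, q \ge 0$. The sketch below does this with SciPy's `linprog` (HiGHS backend); the problem sizes, the Gaussian $\Phi$, and the choice $\Psi = I$ are assumptions for the example, and a dedicated $\ell_1$ solver would normally be preferred at scale.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||a||_1 subject to A a = y as a linear program.

    Write a = p - q with p, q >= 0; at the optimum, sum(p + q) equals ||a||_1.
    """
    n = A.shape[1]
    c = np.ones(2 * n)            # objective: sum(p) + sum(q)
    A_eq = np.hstack([A, -A])     # equality constraint A (p - q) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n] - res.x[n:]


rng = np.random.default_rng(4)
N, M, K = 128, 40, 4
Psi = np.eye(N)                                       # sparsity basis (identity assumed)
Phi = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))  # Gaussian measurement matrix
alpha = np.zeros(N)
alpha[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
x = Psi @ alpha
y = Phi @ x                                           # noise-free measurements

alpha_hat = basis_pursuit(Phi @ Psi, y)               # program (3)
x_hat = Psi @ alpha_hat
print(np.linalg.norm(x - x_hat))
```

For a $K$-sparse $x$ and enough measurements, the recovered $\hat{x}$ typically agrees with $x$ to solver precision, consistent with the exact-recovery statement of Theorem 1.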

Theorem 1 [12] Suppose that $\Phi$ satisfies the RIP of order $2K$ with respect to $\Psi$ and with constant
$\delta_{2K} < \sqrt{2} - 1$. Let $x \in \mathbb{R}^N$, suppose $y = \Phi x$, and let the recovered estimates $\hat{\alpha}$ and $\hat{x}$ be as defined
above. Then

$$\|x - \hat{x}\|_2 = \|\alpha - \hat{\alpha}\|_2 \le C_1 K^{-1/2}\,\|\alpha - \alpha_K\|_1 \tag{4}$$

for a constant $C_1$. In particular, if $x$ is $K$-sparse, then $\hat{x} = x$.

This result can be extended to account for measurement noise. 

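Theorem 2 [12] Suppose that $\Phi$ satisfies the RIP of order $2K$ with respect to $\Psi$ and with constant
$\delta_{2K} < \sqrt{2} - 1$. Let $x \in \mathbb{R}^N$, and suppose that

$$y = \Phi x + \eta,$$

where $\|\eta\|_2 \le \epsilon$. Then let

$$\hat{\alpha} = \arg\min_{\alpha'} \|\alpha'\|_1 \ \text{subject to}\ \|y - \Phi\Psi\alpha'\|_2 \le \epsilon,$$

and set $\hat{x} = \Psi\hat{\alpha}$. Then

$$\|x - \hat{x}\|_2 = \|\alpha - \hat{\alpha}\|_2 \le C_1 K^{-1/2}\,\|\alpha - \alpha_K\|_1 + C_2\,\epsilon \tag{5}$$

for constants $C_1$ (which is the same as above) and $C_2$.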

These results are not unique to $\ell_1$-minimization; similar bounds have been established for signal
recovery using the greedy iterative algorithms ROMP [32] and CoSaMP [31]. Bounds of this type are
extremely encouraging for signal processing. From only $M$ measurements, it is possible to recover
$x$ with quality that is comparable to its proximity to the nearest $K$-sparse signal, and if $x$ itself is
$K$-sparse and there is no measurement noise, then $x$ can be recovered exactly. Moreover, despite
the apparent ill-conditioning of the inverse problem, the measurement noise is not dramatically 
amplified in the recovery process. 
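For comparison with $\ell_1$-minimization, here is a compact sketch of the CoSaMP iteration as it is commonly described (form a signal proxy, merge supports, solve least squares, prune to $K$ terms). It is our own illustration, not the reference implementation of [31]; the iteration cap, stopping tolerance, and problem sizes are assumptions.

```python
import numpy as np


def cosamp(A, y, K, max_iter=50, tol=1e-10):
    """Sketch of CoSaMP for recovering a K-sparse vector a from y = A a."""
    n = A.shape[1]
    a = np.zeros(n)
    r = y.copy()
    for _ in range(max_iter):
        proxy = A.T @ r
        omega = np.argsort(np.abs(proxy))[-2 * K:]          # 2K strongest proxy entries
        T = np.union1d(omega, np.flatnonzero(a))            # merge with the current support
        b_T, *_ = np.linalg.lstsq(A[:, T], y, rcond=None)   # least squares on merged support
        b = np.zeros(n)
        b[T] = b_T
        keep = np.argsort(np.abs(b))[-K:]                   # prune to the K largest entries
        a = np.zeros(n)
        a[keep] = b[keep]
        r = y - A @ a                                       # update the residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(y):
            break
    return a


rng = np.random.default_rng(5)
N, M, K = 256, 80, 5
A = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))   # Gaussian Phi with Psi = I assumed
alpha = np.zeros(N)
alpha[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
a_hat = cosamp(A, A @ alpha, K)
print(np.linalg.norm(a_hat - alpha))
```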



² Our previous use of the term "random orthoprojector" in [3] excluded the normalization factor of $\sqrt{N/M}$.
However, we find it more appropriate to include this factor in the current paper.
These bounds are known as deterministic, instance-optimal bounds because they hold deterministically
for any $\Phi$ that meets the RIP, and because for a given $\Phi$ they give a guarantee for
recovery of any $x \in \mathbb{R}^N$ based on its proximity to the concise model.

The use of $\|\alpha - \alpha_K\|_1$ as a measure for proximity to the concise model (on the right-hand side of (4)
and (5)) arises due to the difficulty in establishing $\ell_2$ bounds on the right-hand side. Indeed, it is
known that deterministic $\ell_2$ instance-optimal bounds comparable to (4) and (5) cannot exist.
In particular, for any $\Phi$, ensuring that $\|x - \hat{x}\|_2 \le C_3\,\|x - x_K\|_2$ for all $x$ is known [13]
to require $M \ge C_4 N$ regardless of $K$.

However, it is possible to obtain an instance-optimal $\ell_2$ bound for sparse signal recovery in the
noise-free setting by changing from a deterministic formulation to a probabilistic one [13, 16]. In
particular, by considering any given $x \in \mathbb{R}^N$, it is possible to show that for most random $\Phi$, letting
the measurements $y = \Phi x$, and recovering $\hat{x}$ via $\ell_1$-minimization (3), it holds that

$$\|x - \hat{x}\|_2 \le C_5\,\|x - x_K\|_2. \tag{6}$$

While the proof of this statement [16] does not involve the RIP directly, it holds for many of
the same random distributions that work for RIP matrices, and it requires the same number of
measurements (2) up to a constant.

Similar bounds hold for the closely related problem of Q2 (sketching), where the goal is to
use the compressive measurement vector $y$ to identify and report only approximately $K$ expansion
coefficients that best describe the original signal, i.e., a sparse approximation to $\alpha_K$. In the case
where $\Psi = I$, an efficient randomized measurement process coupled with a customized recovery
algorithm [23] provides signal sketches that meet a deterministic mixed-norm $\ell_2/\ell_1$ instance-optimal
bound analogous to (4). A desirable aspect of this construction is that the computational complexity
scales with only $\log(N)$ (and is polynomial in $K$); this is possible because only approximately $K$
pieces of information must be computed to describe the signal. For signals that are sparse in the
Fourier domain ($\Psi$ consists of the DFT vectors), probabilistic $\ell_2/\ell_2$ instance-optimal bounds have
also been established [22] that are analogous to (6).

3 Compressive Measurements of Manifold-Modeled Signals 
3.1 Manifold models 

As we have discussed in Section 1.4, there are many possible modeling frameworks for capturing 
concise signal structure. Among these possibilities are the broad class of manifold models. 

Manifold models arise, for example, in settings where the signals of interest vary continuously as
a function of some $K$-dimensional parameter. Suppose, for instance, that there exists some parameter
$\theta$ that controls the generation of the signal. We let $x_\theta \in \mathbb{R}^N$ denote the signal corresponding to
the parameter $\theta$, and we let $\Theta$ denote the $K$-dimensional parameter space from which $\theta$ is drawn.
In general, $\Theta$ itself may be a $K$-dimensional manifold and need not be embedded in an ambient
Euclidean space. For example, supposing $\theta$ describes the 1-D rotation parameter in a top-down
satellite image, we have $\Theta = S^1$.

Under certain conditions on the parameterization $\theta \mapsto x_\theta$, it follows that

$$\mathcal{M} := \{x_\theta : \theta \in \Theta\}$$

forms a $K$-dimensional submanifold of $\mathbb{R}^N$. An appropriate visualization is that the set $\mathcal{M}$ forms a
nonlinear $K$-dimensional "surface" within the high-dimensional ambient signal space $\mathbb{R}^N$. Depending
on the circumstances, we may measure the distance between two points $x_{\theta_1}$ and $x_{\theta_2}$ on
the manifold $\mathcal{M}$ using either the ambient Euclidean distance

$$\|x_{\theta_1} - x_{\theta_2}\|_2$$

or the geodesic distance along the manifold, which we denote as $d_{\mathcal{M}}(x_{\theta_1}, x_{\theta_2})$. In the case where
the geodesic distance along $\mathcal{M}$ equals the native distance in parameter space, i.e., when

$$d_{\mathcal{M}}(x_{\theta_1}, x_{\theta_2}) = d_\Theta(\theta_1, \theta_2), \tag{7}$$

we say that $\mathcal{M}$ is isometric to $\Theta$. The definition of the distance $d_\Theta(\theta_1, \theta_2)$ depends on the appropriate
metric for the parameter space $\Theta$; supposing $\Theta$ is a convex subset of Euclidean space, then
we have $d_\Theta(\theta_1, \theta_2) = \|\theta_1 - \theta_2\|_2$.

While our discussion above concentrates on the case of manifolds $\mathcal{M}$ generated by underlying
parameterizations, we stress that manifolds have also been proposed as approximate low-dimensional
models within $\mathbb{R}^N$ for nonparametric signal classes such as images of human faces or handwritten
digits [5, 26, 35]. These signal families may also be considered.

The results we present in this paper will make reference to certain characteristic properties of 
the manifold under study. These terms are originally defined in [3, 33] and are repeated here for 
completeness. First, our results will depend on a measure of regularity for the manifold. For this 
purpose, we adopt the condition number defined recently by Niyogi et al. [33].

Definition 1 [33] Let $\mathcal{M}$ be a compact Riemannian submanifold of $\mathbb{R}^N$. The condition number
is defined as $1/\tau$, where $\tau$ is the largest number having the following property: The open normal
bundle about $\mathcal{M}$ of radius $r$ is embedded in $\mathbb{R}^N$ for all $r < \tau$.

The condition number $1/\tau$ controls both local properties and global properties of the manifold.
Its role is summarized in two key relationships [33]. First, the curvature of any unit-speed
geodesic path on $\mathcal{M}$ is bounded by $1/\tau$. Second, at long geodesic distances, the condition number
controls how close the manifold may curve back upon itself. For example, supposing $x_1, x_2 \in \mathcal{M}$
with $d_{\mathcal{M}}(x_1, x_2) > \tau$, it must hold that $\|x_1 - x_2\|_2 > \tau/2$.

We also require a notion of "geodesic covering regularity" for a manifold. While this property 
is not the focus of the present paper, we include its definition in Appendix A for completeness. 

We conclude with a brief but concrete example to illustrate specific values for these quantities.
Let $N > 0$, $\kappa > 0$, $\Theta = \mathbb{R} \bmod 2\pi$, and suppose $x_\theta \in \mathbb{R}^N$ is given by

$$x_\theta = [\kappa\cos(\theta);\ \kappa\sin(\theta);\ 0;\ 0;\ \cdots;\ 0]^T.$$

In this case, $\mathcal{M} = \{x_\theta : \theta \in \Theta\}$ forms a circle of radius $\kappa$ in the $x(1), x(2)$ plane. The manifold
dimension is $K = 1$, the condition number is $1/\tau$ with $\tau = \kappa$, and the geodesic covering regularity $R$ can be
chosen as any number larger than i. We also refer in our results to the $K$-dimensional volume $V$
of $\mathcal{M}$, which in this example corresponds to the circumference $2\pi\kappa$ of the circle.
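The circle example can be checked numerically. The sketch below, with an arbitrary radius $\kappa = 2$ and ambient dimension $N = 10$ chosen for illustration, estimates the volume $V$ from a fine polygon and verifies on random pairs that points at geodesic distance greater than $\tau = \kappa$ are more than $\tau/2$ apart in Euclidean distance, as the condition number guarantees.

```python
import numpy as np

kappa, N = 2.0, 10   # circle radius and ambient dimension: illustrative values
tau = kappa          # the condition number is 1/tau with tau = kappa for this circle


def x_theta(theta):
    """Rows are x_theta = [kappa cos(theta); kappa sin(theta); 0; ...; 0] in R^N."""
    theta = np.atleast_1d(theta)
    X = np.zeros((theta.size, N))
    X[:, 0] = kappa * np.cos(theta)
    X[:, 1] = kappa * np.sin(theta)
    return X


# Volume V (here the circumference) from a fine polygonal approximation of the circle.
pts = x_theta(np.linspace(0.0, 2 * np.pi, 100001))
V_est = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))

# Random pairs: the geodesic distance is kappa times the wrapped angular gap.
rng = np.random.default_rng(5)
t1, t2 = rng.uniform(0.0, 2 * np.pi, size=(2, 20000))
gap = np.abs(t1 - t2)
gap = np.minimum(gap, 2 * np.pi - gap)        # angular gap wrapped into [0, pi]
geo = kappa * gap
euc = np.linalg.norm(x_theta(t1) - x_theta(t2), axis=1)

# Consequence of the condition number: d_M(x1, x2) > tau implies ||x1 - x2||_2 > tau / 2.
far = geo > tau
print(V_est, 2 * np.pi * kappa, euc[far].min(), tau / 2)
```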

3.2 Stable embeddings of manifolds 

In cases where the signal class of interest $\mathcal{M}$ forms a low-dimensional submanifold of $\mathbb{R}^N$, we have
theoretical justification that the information necessary to distinguish and recover signals $x \in \mathcal{M}$ can
be well-preserved under a sufficient number of compressive measurements $y = \Phi x$. In particular,
we have recently shown that an RIP-like property holds for families of manifold-modeled signals. 
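A small experiment suggests what such a property looks like before it is stated formally in Theorem 3. The sketch below samples the shifted-pulse manifold of Figure 1, applies a random orthoprojector, and reports the empirical distortion $\max |\|\Phi x_1 - \Phi x_2\|_2 / \|x_1 - x_2\|_2 - 1|$ over all sampled pairs. The values of `M` and the pulse width `SIGMA` are illustrative assumptions, and a finite sample only lower-bounds the worst-case $\epsilon$ over the whole manifold.

```python
import numpy as np

N, M, SIGMA = 1024, 100, 0.05   # M and the pulse width SIGMA are illustrative assumptions


def pulse_signal(theta):
    """Samples x_theta(n) = g(n/N - theta) of a Gaussian pulse g."""
    t = np.arange(1, N + 1) / N
    return np.exp(-((t - theta) ** 2) / (2 * SIGMA ** 2))


def pairwise_distances(Z):
    """Euclidean distances between all rows of Z, computed from the Gram matrix."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.sqrt(np.maximum(d2, 0.0))


rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
Phi = np.sqrt(N / M) * Q.T                          # random orthoprojector (Section 2.2)

thetas = np.linspace(0.0, 1.0, 200)
X = np.stack([pulse_signal(th) for th in thetas])   # sampled points of the manifold
Y = X @ Phi.T                                       # their images Phi x

i, j = np.triu_indices(len(thetas), k=1)            # all distinct pairs
ratios = pairwise_distances(Y)[i, j] / pairwise_distances(X)[i, j]
eps_hat = np.max(np.abs(ratios - 1.0))              # empirical epsilon over these pairs only
print(f"ratios in [{ratios.min():.3f}, {ratios.max():.3f}], eps_hat = {eps_hat:.3f}")
```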
Theorem 3 [3] Let $\mathcal{M}$ be a compact $K$-dimensional Riemannian submanifold of $\mathbb{R}^N$ having condition
number $1/\tau$, volume $V$, and geodesic covering regularity $R$. Fix $0 < \epsilon < 1$ and $0 < \rho < 1$.
Let $\Phi$ be a random $M \times N$ orthoprojector with

$$M = O\left(\frac{K \log(N V R \tau^{-1} \epsilon^{-1}) \log(1/\rho)}{\epsilon^2}\right). \tag{8}$$

If $M \le N$, then with probability at least $1 - \rho$ the following statement holds: For every pair of
points $x_1, x_2 \in \mathcal{M}$,

$$(1 - \epsilon)\,\|x_1 - x_2\|_2 \le \|\Phi x_1 - \Phi x_2\|_2 \le (1 + \epsilon)\,\|x_1 - x_2\|_2. \tag{9}$$

The proof of this theorem involves the Johnson-Lindenstrauss Lemma [1, 2, 14], which guarantees
a stable embedding for a finite point cloud under a sufficient number of random projections. In
essence, manifolds with higher volume or with greater curvature have more complexity and require a
denser covering for application of the Johnson-Lindenstrauss Lemma; this leads to an increased
number of measurements (8).

By comparing (1) with (9), we see a strong analogy to the RIP of order $2K$. This theorem establishes
that, like the class of $K$-sparse signals, a collection of signals described by a $K$-dimensional
manifold $\mathcal{M} \subset \mathbb{R}^N$ can have a stable embedding in an $M$-dimensional measurement space. Moreover,
the requisite number of random measurements $M$ is once again linearly proportional to the
information level (or number of degrees of freedom) $K$.

As was the case with the RIP for sparse signal processing, this result has a number of possible
implications for manifold-based signal processing. Individual signals obeying a manifold model
can be acquired and stored efficiently using compressive measurements, and it is unnecessary to
employ the manifold model itself as part of the compression process. Rather, the model need
be used only for signal understanding from the compressive measurements. Problems such as Q1
(signal recovery) and Q3 (parameter estimation) can be addressed. We have reported promising
experimental results with various classes of parametric signals [15, 36]. We have also extended
Theorem 3 to the case of multiple manifolds that are simultaneously embedded [15]; this allows
both the classification of an observed object to one of several possible models (different manifolds)
and the estimation of a parameter within that class (position on a manifold). Moreover, collections
of signals obeying a manifold model (such as multiple images of a scene photographed from different
perspectives) can be acquired using compressive measurements, and the resulting manifold structure
will be preserved among the suite of measurement vectors in $\mathbb{R}^M$. We have provided empirical and
theoretical support for the use of manifold learning in the reduced-dimensional space [25]; this can
dramatically simplify the computational and storage demands on a system for processing large
databases of signals.

3.3 Signal recovery and parameter estimation

In this paper, we provide theoretical justification for the encouraging experimental results that have
been observed for problems Q1 (signal recovery) and Q3 (parameter estimation).

To be specific, let us consider a length-$N$ signal $x$ that, rather than being $K$-sparse, we assume
lives on or near some known $K$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^N$. From a collection of measurements

$$y = \Phi x + \eta,$$

where $\Phi$ is a random $M \times N$ matrix and $\eta \in \mathbb{R}^M$ is an additive noise vector, we would like to
recover either $x$ or a parameter $\theta$ that generates $x$.

For the signal recovery problem, we will consider the following as a method for estimating $x$:

$$\hat{x} = \arg\min_{x' \in \mathcal{M}} \|y - \Phi x'\|_2, \tag{10}$$

supposing here and elsewhere that the minimum is uniquely defined. We also let $x^*$ be the optimal
"nearest neighbor" to $x$ on $\mathcal{M}$, i.e.,

$$x^* = \arg\min_{x' \in \mathcal{M}} \|x - x'\|_2. \tag{11}$$

To consider signal recovery successful, we would like to guarantee that $\|x - \hat{x}\|_2$ is not much larger
than $\|x - x^*\|_2$.
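When $\mathcal{M}$ can be sampled densely, (10) and (11) can be approximated by brute-force nearest-neighbor searches over the samples. The sketch below does this for the shifted-pulse manifold with an off-manifold signal and noisy measurements; the sample density, noise levels, `M`, and pulse width are hypothetical choices, and the search is a stand-in for a real solver, not a method proposed in the text. Because $x^*$ minimizes over the same sample set that contains $\hat{x}$, the computed ratio $\|x - \hat{x}\|_2 / \|x - x^*\|_2$ is always at least 1; the question studied in this paper is how far above 1 it can go.

```python
import numpy as np

N, M, SIGMA = 1024, 50, 0.05   # illustrative sizes; SIGMA is an assumed pulse width
rng = np.random.default_rng(7)


def pulse_signal(theta):
    """Samples x_theta(n) = g(n/N - theta) of a Gaussian pulse g."""
    t = np.arange(1, N + 1) / N
    return np.exp(-((t - theta) ** 2) / (2 * SIGMA ** 2))


Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
Phi = np.sqrt(N / M) * Q.T                               # random orthoprojector

# Dense sample of the manifold: each row is a point x' on M, with its image Phi x'.
grid = np.linspace(0.0, 1.0, 4001)
samples = np.stack([pulse_signal(th) for th in grid])
images = samples @ Phi.T

x = pulse_signal(0.42) + 0.05 * rng.standard_normal(N)   # a signal slightly off the manifold
y = Phi @ x + 0.01 * rng.standard_normal(M)              # noisy compressive measurements

x_hat = samples[np.argmin(np.linalg.norm(images - y, axis=1))]    # approximates (10)
x_star = samples[np.argmin(np.linalg.norm(samples - x, axis=1))]  # approximates (11)
ratio = np.linalg.norm(x - x_hat) / np.linalg.norm(x - x_star)
print(ratio)
```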

For the parameter estimation problem, where we presume x k, xg for some 6 & Q, we propose 
a similar method for estimating 9 from the compressive measurements: 

9 = arg min \\y — $xg/ 

Let 9* be the "optimal estimate" that could be obtained using the fuh data x G M^^, i.e., 

9* = arg min llx — xeAl 

(If X = X0 exactly for some 9, then 9* = 9; otherwise this formulation allows us to consider signals 
X that are not precisely on the manifold M in M^. This generalization has practical relevance; 
a local image block, for example, may only approximately resemble a straight edge, which has a 
simple parameterization.) To consider parameter estimation successful, we would like to guarantee 
that de{9,9*) is smah. 

As we will see, bounds pertaining to accurate signal recovery can often be extended to imply 
accurate parameter estimation as well. However, the relationships between distance de in parameter 
space and distances d_M and || • ||2 in the signal space can vary depending on the parametric signal 
model under study. Thus, for the parameter estimation problem, our ability to provide generic 
bounds on dQ{9,9*) will be restricted. In this paper we focus primarily on the signal recovery 
problem and provide preliminary results for the parameter estimation problem that pertain most 
strongly to the case of isometric parameterizations. 

In this paper, we do not confront in depth the question of how a recovery program such as (10) can be efficiently solved. Some discussion of this matter is provided in [3], with application-specific examples provided in [15,36]. Unfortunately, it is difficult to propose a single general-purpose algorithm for solving (10) in $\mathbb{R}^M$, as even the problem (11) in $\mathbb{R}^N$ may be difficult to solve depending on certain nuances (such as topology) of the individual manifold. Nonetheless, iterative algorithms such as Newton's method [37] have proved helpful in many problems to date. Additional complications arise when the manifold $\mathcal{M}$ is non-differentiable, as may happen when the signals $x$ represent 2-D images. However, just as a multiscale regularization can be incorporated into Newton's method for solving (11) (see [37]), an analogous regularization can be incorporated into a compressive measurement operator $\Phi$ to facilitate Newton's method for solving (10) (see [18,36]). For manifolds that lack differentiability, additional care must be taken when applying results such as Theorem 3; we defer a study of these matters to a subsequent paper.
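A minimal sketch of the "coarse discrete search, then Newton-like refinement" strategy for (12), assuming a one-parameter, differentiable signal family (a hypothetical Gaussian pulse). It is not the method of [37] or [18,36]: the refinement uses Gauss-Newton steps, a standard Newton-type variant for least-squares objectives that needs only the derivative $\partial x_\theta / \partial \theta$. The names `pulse_and_derivative` and `solve_12` are illustrative.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 256)
WIDTH = 0.03   # hypothetical pulse width

def pulse_and_derivative(theta):
    """Hypothetical x_theta (Gaussian pulse centered at theta) and its derivative d x_theta / d theta."""
    g = np.exp(-((t - theta) ** 2) / (2 * WIDTH ** 2))
    return g, g * (t - theta) / WIDTH ** 2

def solve_12(y, Phi, coarse_grid, iters=20, tol=1e-12):
    """Coarse grid search on (12), then Gauss-Newton refinement of ||y - Phi x_theta||_2^2."""
    errs = [np.linalg.norm(y - Phi @ pulse_and_derivative(th)[0]) for th in coarse_grid]
    theta = float(coarse_grid[int(np.argmin(errs))])
    for _ in range(iters):
        x, dx = pulse_and_derivative(theta)
        r = y - Phi @ x            # residual in the measurement space R^M
        J = Phi @ dx               # derivative of theta -> Phi x_theta (a vector, since theta is scalar)
        step = (J @ r) / (J @ J)   # Gauss-Newton step for a single parameter
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(1)
N, M = t.size, 15
Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
Phi = np.sqrt(N / M) * Q.T          # random orthoprojector, scaled as in the appendices
theta_true = 0.4237
y = Phi @ pulse_and_derivative(theta_true)[0]
theta_hat = solve_12(y, Phi, np.linspace(0.1, 0.9, 81))
print(theta_hat)
```

The coarse stage only has to land inside the basin of attraction of the refinement; for non-differentiable families the derivative used here does not exist, which is where the multiscale regularization discussed above becomes relevant.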

In the following, we will consider both deterministic and probabilistic instance-optimal bounds for signal recovery and parameter estimation, and we will draw comparisons to the sparsity-based CS results of Section 2.3. Our bounds are formulated in terms of generic properties of the manifold (as mentioned in Section 3.1), which will vary from signal model to signal model. In some cases, calculating these may be possible, whereas in other cases it may not. Nonetheless, we feel the results in this paper highlight the relative importance of these properties in determining the requisite number of measurements. Finally, to simplify analysis we will focus on random orthoprojectors for the measurement operator $\Phi$, although our results may be extended to other random distributions such as the Gaussian [3].
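The following sketch draws a random orthoprojector in the normalization used in the appendices, $\Phi = \sqrt{N/M}\,\Psi$ with $\Psi$ having orthonormal rows (obtained here from a QR factorization of a Gaussian matrix, one common construction), and reports the smallest $\epsilon$ for which (9) holds on a finite sample of points from a hypothetical one-dimensional manifold. This is only an empirical check on the sampled pairs, not a guarantee for the whole manifold; the function names are illustrative.

```python
import numpy as np

def random_orthoprojector(M, N, rng):
    """Phi = sqrt(N/M) * Psi, where the M x N matrix Psi has orthonormal rows (QR of a Gaussian matrix)."""
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
    return np.sqrt(N / M) * Q.T

def empirical_eps(Phi, points):
    """Smallest eps with (1 - eps) <= ||Phi(a - b)|| / ||a - b|| <= (1 + eps) over all sampled pairs."""
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = points[i] - points[j]
            ratios.append(np.linalg.norm(Phi @ d) / np.linalg.norm(d))
    ratios = np.asarray(ratios)
    return max(1.0 - ratios.min(), ratios.max() - 1.0)

# Hypothetical 1-D manifold in R^N: unit-norm Gaussian pulses at 30 shifts.
N = 256
t = np.linspace(0.0, 1.0, N)
points = []
for theta in np.linspace(0.2, 0.8, 30):
    v = np.exp(-((t - theta) ** 2) / (2 * 0.03 ** 2))
    points.append(v / np.linalg.norm(v))

rng = np.random.default_rng(0)
eps_by_M = {M: empirical_eps(random_orthoprojector(M, N, rng), points) for M in (10, 40, 160)}
print(eps_by_M)   # the observed distortion typically shrinks as M grows
```

The scaling $\sqrt{N/M}$ makes $\mathbb{E}\|\Phi d\|_2^2 = \|d\|_2^2$, so the ratios concentrate around 1 rather than around $\sqrt{M/N}$.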



4 A deterministic instance-optimal bound in $\ell_2$

We begin by seeking a deterministic instance-optimal bound. That is, for a measurement matrix $\Phi$ that meets (9) for all $x_1, x_2 \in \mathcal{M}$, we seek an upper bound for the relative reconstruction error

$$\frac{\|x - \hat{x}\|_2}{\|x - x^*\|_2}$$

that holds uniformly for all $x \in \mathbb{R}^N$. In this section we consider only the signal recovery problem; however, similar bounds would apply to parameter estimation. We have the following result for the noise-free case, which applies not only to the manifolds described in Theorem 3 but also to more general sets.

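Theorem 4 Let $\mathcal{M}$ be any subset of $\mathbb{R}^N$, and let $\Phi$ denote an $M \times N$ orthoprojector satisfying (9) for all $x_1, x_2 \in \mathcal{M}$. Suppose $x \in \mathbb{R}^N$, let $y = \Phi x$, and let the recovered estimate $\hat{x}$ and the optimal estimate $x^*$ be as defined in (10) and (11). Then

$$\frac{\|x - \hat{x}\|_2}{\|x - x^*\|_2} \le \sqrt{\frac{4N}{M(1-\epsilon)^2} - 3 + 2\sqrt{\frac{N}{M(1-\epsilon)^2} - 1}}. \tag{14}$$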



Proof: See Appendix B. 

As $M/N \to 0$, the bound on the right-hand side of (14) grows as $\frac{2\sqrt{N}}{(1-\epsilon)\sqrt{M}}$. Unfortunately, this is not desirable for signal recovery. Supposing, for example, that we wish to ensure $\|x - \hat{x}\|_2 \le C_6 \|x - x^*\|_2$ for all $x \in \mathbb{R}^N$, then using the bound (14) we would require that $M \ge C_7 N$ regardless of the dimension $K$ of the manifold.
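The dependence described above can be checked directly: the right-hand side of (14) depends on $M$ and $N$ only through $N/M$, equals 1 when $M = N$ and $\epsilon = 0$, and approaches $2\sqrt{N}/((1-\epsilon)\sqrt{M})$ as $M/N \to 0$. The helper name `theorem4_bound` is hypothetical.

```python
import math

def theorem4_bound(M, N, eps):
    """Right-hand side of (14): worst-case ratio ||x - x_hat||_2 / ||x - x*||_2 (noise-free case)."""
    r = N / (M * (1.0 - eps) ** 2)      # r >= 1 whenever M <= N and 0 <= eps < 1
    return math.sqrt(4 * r - 3 + 2 * math.sqrt(r - 1))

# Same ratio M/N gives the same bound, whatever the manifold dimension K.
for N, M in [(1000, 10), (10000, 100), (1000, 100), (1000, 1000)]:
    print(N, M, round(theorem4_bound(M, N, 0.1), 2))
```

Because the bound is a function of $N/M$ alone, keeping it below a fixed constant forces $M$ to grow proportionally with $N$.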

The weakness of this bound is a geometric necessity; indeed, the bound itself is quite tight in general, as the following simple example illustrates. Suppose $N \ge 2$ and let $\mathcal{M}$ denote the line segment in $\mathbb{R}^N$ joining the points $(0, 0, \ldots, 0)$ and $(1, 0, 0, \ldots, 0)$. Fix some $\gamma$ with $0 < \gamma < \pi/2$, let $M = 1$, and let the $1 \times N$ measurement matrix be

$$\Phi = \sqrt{N}\,[\cos(\gamma),\ -\sin(\gamma),\ 0,\ 0,\ \ldots,\ 0].$$

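Any $x_1 \in \mathcal{M}$ we may write as $x_1 = (x_1(1), 0, 0, \ldots, 0)$, and it follows that $\Phi x_1 = \sqrt{N}\cos(\gamma)\, x_1(1)$. Thus for any pair $x_1, x_2 \in \mathcal{M}$, we have

$$\frac{\|\Phi x_1 - \Phi x_2\|_2}{\|x_1 - x_2\|_2} = \frac{|\sqrt{N}\cos(\gamma)\, x_1(1) - \sqrt{N}\cos(\gamma)\, x_2(1)|}{|x_1(1) - x_2(1)|} = \sqrt{N}\cos(\gamma).$$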
We suppose that $\cos(\gamma) < \frac{1}{\sqrt{N}}$, and thus, referring to equation (9), we have $(1 - \epsilon) = \sqrt{N}\cos(\gamma)$. Now, we may consider the signal $x = (1, \tan(\pi/2 - \gamma), 0, 0, \ldots, 0)$. We then have that $x^* = (1, 0, 0, \ldots, 0)$, and $\|x - x^*\|_2 = \tan(\pi/2 - \gamma)$. We also have that $\Phi x = \sqrt{N}(\cos(\gamma) - \sin(\gamma)\tan(\pi/2 - \gamma)) = 0$. Thus $\hat{x} = (0, 0, \ldots, 0)$ and $\|x - \hat{x}\|_2 = \frac{1}{\cos(\pi/2 - \gamma)}$, and so

$$\frac{\|x - \hat{x}\|_2}{\|x - x^*\|_2} = \frac{1}{\cos(\pi/2 - \gamma)\tan(\pi/2 - \gamma)} = \frac{1}{\sin(\pi/2 - \gamma)} = \frac{1}{\cos(\gamma)} = \frac{\sqrt{N}}{1 - \epsilon}.$$
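The line-segment example can be reproduced numerically. The sketch below uses hypothetical values $N = 16$ and $\gamma = 1.4$ (which satisfy $\cos\gamma < 1/\sqrt{N}$), solves (10) and (11) by brute force over a fine grid on the segment, confirms that the error ratio equals $1/\cos\gamma = \sqrt{N}/(1-\epsilon)$, and compares it with the right-hand side of (14) for $M = 1$.

```python
import numpy as np

N, gamma = 16, 1.4                      # hypothetical values with cos(gamma) < 1/sqrt(N)
Phi = np.zeros((1, N))                  # the 1 x N measurement matrix of the example
Phi[0, 0], Phi[0, 1] = np.cos(gamma), -np.sin(gamma)
Phi *= np.sqrt(N)
one_minus_eps = np.sqrt(N) * np.cos(gamma)

x = np.zeros(N)
x[0], x[1] = 1.0, np.tan(np.pi / 2 - gamma)
y = Phi @ x                             # equals 0 up to rounding

# The manifold is the segment {(s, 0, ..., 0) : 0 <= s <= 1}; solve (10) and (11) on a fine grid of s.
s = np.linspace(0.0, 1.0, 100001)
s_hat = s[np.argmin(np.abs(y[0] - np.sqrt(N) * np.cos(gamma) * s))]   # (10)
s_star = s[np.argmin(np.hypot(x[0] - s, x[1]))]                      # (11)

ratio = np.hypot(x[0] - s_hat, x[1]) / np.hypot(x[0] - s_star, x[1])
r = N / one_minus_eps ** 2              # M = 1 in (14)
bound14 = np.sqrt(4 * r - 3 + 2 * np.sqrt(r - 1))
print(ratio, np.sqrt(N) / one_minus_eps, bound14)
```

The example attains $\sqrt{N}/(1-\epsilon)$, while (14) is roughly twice that for small $M/N$, so the bound is tight up to a modest constant factor.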

It is worth recalling that, as we discussed in Section 2.3, similar difficulties arise in sparsity-based CS when attempting to establish a deterministic $\ell_2$ instance-optimal bound. In particular, to ensure that $\|x - \hat{x}\|_2 \le C_3 \|x - x_K\|_2$ for all $x \in \mathbb{R}^N$, it is known [13] that this requires $M \ge C_4 N$ regardless of the sparsity level $K$.

In sparsity-based CS, there have been at least two types of alternative approaches. The first are the deterministic "mixed-norm" results of the type given in (4) and (5). These involve the use of an alternative norm such as the $\ell_1$ norm to measure the distance from the coefficient vector $\alpha$ to its best $K$-term approximation $\alpha_K$. While it may be possible to pursue similar directions for manifold-modeled signals, we feel this is undesirable as a general approach because when sparsity is no longer part of the modeling framework, the $\ell_1$ norm has less of a natural meaning. Instead, we prefer to seek bounds using $\ell_2$, as that is the most conventional norm used in signal processing to measure energy and error.

Thus, the second type of alternative bounds in sparsity-based CS has involved $\ell_2$ bounds in probability, as we discussed in Section 2.3. Indeed, the performance of both sparsity-based and manifold-based CS is often much better in practice than a deterministic $\ell_2$ instance-optimal bound might indicate. The reason is that any such bound must account for the worst-case signal over all possible $x \in \mathbb{R}^N$. Fortunately, this worst case is not typical. As a result, it is possible to derive much stronger results that consider any given signal $x \in \mathbb{R}^N$ and establish that, for most random $\Phi$, the recovery error of that signal $x$ will be small.



5 Probabilistic instance-optimal bounds in $\ell_2$

For a given measurement operator $\Phi$, our bound in Theorem 4 applies uniformly to any signal in $\mathbb{R}^N$. However, a much sharper bound can be obtained by relaxing the deterministic requirement.

5.1 Signal recovery 

Our first bound applies to the signal recovery problem, and we include the consideration of additive 
noise in the measurements. 

Theorem 5 Suppose $x \in \mathbb{R}^N$. Let $\mathcal{M}$ be a compact $K$-dimensional Riemannian submanifold of $\mathbb{R}^N$ having condition number $1/\tau$, volume $V$, and geodesic covering regularity $R$. Fix $0 < \epsilon < 1$ and $0 < \rho < 1$. Let $\Phi$ be a random $M \times N$ orthoprojector, chosen independently of $x$, with

$$M = O\left(\frac{K \log(N V R \tau^{-1} \epsilon^{-1}) \log(1/\rho)}{\epsilon^2}\right). \tag{15}$$

Let $\eta \in \mathbb{R}^M$, let $y = \Phi x + \eta$, and let the recovered estimate $\hat{x}$ and the optimal estimate $x^*$ be as defined in (10) and (11). If $M \le N$, then with probability at least $1 - \rho$ the following statement holds:

$$\|x - \hat{x}\|_2 \le (1 + 0.25\epsilon)\|x - x^*\|_2 + (2 + 0.32\epsilon)\|\eta\|_2 + \frac{\epsilon^2 \tau}{936 N}. \tag{16}$$

Proof: See Appendix C.



The proof of this theorem, like that of Theorem 3, involves the Johnson-Lindenstrauss Lemma. Our proof of Theorem 5 extends the proof of Theorem 3 by adding the points $x$ and $x^*$ to the finite sampling of points drawn from $\mathcal{M}$ that are used to establish (9).

Let us now compare and contrast our bound with the analogous results for sparsity-based CS. Like Theorem 2, we consider the problem of signal recovery in the presence of additive measurement noise. Both bounds relate the recovery error $\|x - \hat{x}\|_2$ to the proximity of $x$ to its nearest neighbor in the concise model class (either $x_K$ or $x^*$ depending on the model), and both bounds relate the recovery error $\|x - \hat{x}\|_2$ to the amount $\|\eta\|_2$ of additive measurement noise. However, Theorem 2 is a deterministic bound whereas Theorem 5 is probabilistic, and our bound (16) measures proximity to the concise model in the $\ell_2$ norm, whereas (5) uses the $\ell_1$ norm.

Our bound can also be compared with (6), as both are instance-optimal bounds in probability, and both use the $\ell_2$ norm to measure proximity to the concise model. However, we note that, unlike (6), our bound (16) allows the consideration of measurement noise.

Finally, we note that there is an additional term $\frac{\epsilon^2\tau}{936N}$ appearing on the right-hand side of (16). This term becomes relevant only when both $\|x - x^*\|_2$ and $\|\eta\|_2$ are significantly smaller than $\tau$ (recall that the condition number is $1/\tau$), since $\epsilon^2 < 1$ and $\frac{1}{936N} \ll 1$. Indeed, in these regimes the signal recovery remains accurate (the error is much smaller than $\tau$), but the quantity $\|x - \hat{x}\|_2$ may not remain strictly proportional to $\|x - x^*\|_2$ and $\|\eta\|_2$. The bound may also be sharpened by artificially assuming a condition number $1/\tau' > 1/\tau$ for the purpose of choosing the number of measurements $M$ in (15). This replaces the last term in (16) with $\frac{\epsilon^2\tau'}{936N}$, at the cost of a logarithmic increase in $M$. In the case where $\eta = 0$, it is also possible to resort to the bound (14); this bound is inferior to (16) when $\|x - x^*\|_2$ is large but ensures that $\|x - \hat{x}\|_2 \to 0$ when $\|x - x^*\|_2 \to 0$.
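The trade-off between (14) and (16) in the noise-free case can be seen by evaluating both right-hand sides as functions of $d = \|x - x^*\|_2$. In this sketch the values of $M$, $N$, $\epsilon$, and $\tau$ are hypothetical and are not claimed to satisfy (15); the point is only that (14) is multiplicative in $d$, while (16) carries a small additive floor, so the two cross at a tiny value of $d$.

```python
import math

def bound14(d, M, N, eps):
    """(14) rearranged: ||x - x_hat||_2 <= C(M, N, eps) * ||x - x*||_2 (eta = 0 only)."""
    r = N / (M * (1 - eps) ** 2)
    return math.sqrt(4 * r - 3 + 2 * math.sqrt(r - 1)) * d

def bound16(d, eta_norm, N, eps, tau):
    """Right-hand side of (16), with d = ||x - x*||_2 and eta_norm = ||eta||_2."""
    return (1 + 0.25 * eps) * d + (2 + 0.32 * eps) * eta_norm + eps ** 2 * tau / (936 * N)

# Hypothetical parameter values chosen for illustration only.
M, N, eps, tau = 50, 1000, 0.2, 1.0
slope14 = bound14(1.0, M, N, eps)
# Noise-free crossover: below d_cross the multiplicative bound (14) is the smaller one.
d_cross = (eps ** 2 * tau / (936 * N)) / (slope14 - (1 + 0.25 * eps))
for d in (1e-10, d_cross, 1e-6, 1e-2):
    print(f"d={d:.2e}  (14): {bound14(d, M, N, eps):.3e}  (16): {bound16(d, 0.0, N, eps, tau):.3e}")
```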

5.2 Parameter estimation 

Above we have derived a bound for the signal recovery problem, with an error metric that measures the discrepancy between the recovered signal $\hat{x}$ and the original signal $x$.

However, in some applications it may be the case that the original signal $x \approx x_{\theta^*}$, where $\theta^* \in \Theta$ is a parameter of interest. In this case we may be interested in using the compressive measurements $y = \Phi x + \eta$ to solve the problem (12) and recover an estimate $\hat{\theta}$ of the underlying parameter.

Of course, these two problems are closely related. However, we should emphasize that guaranteeing that $\|x_{\hat{\theta}} - x_{\theta^*}\|_2$ is small does not automatically guarantee that $d_{\mathcal{M}}(x_{\hat{\theta}}, x_{\theta^*})$ is small (and therefore does not ensure that $d_\Theta(\hat{\theta}, \theta^*)$ is small). If the manifold is shaped like a horseshoe, for example, then it could be the case that $x_{\theta^*}$ sits at the end of one arm but $x_{\hat{\theta}}$ sits at the end of the opposing arm. These two points would be much closer in a Euclidean metric than in a geodesic one.
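A numerical version of the horseshoe: for a hypothetical circular arc of radius $r$ with a small gap between its two tips, the tips are close in $\ell_2$ but far apart along the manifold. The arc, its radius, and the gap size are illustrative assumptions.

```python
import numpy as np

def arc_point(phi, r=1.0):
    """Hypothetical 'horseshoe' manifold: a circular arc of radius r, parameterized by angle phi."""
    return np.array([r * np.cos(phi), r * np.sin(phi)])

r, gap = 1.0, 0.2
phi_a, phi_b = 0.0, 2 * np.pi - gap          # the tips of the two arms
euclid = np.linalg.norm(arc_point(phi_a, r) - arc_point(phi_b, r))
geodesic = r * (phi_b - phi_a)                # arc length measured along the manifold
print(euclid, geodesic, geodesic / euclid)    # about 0.2, about 6.08, ratio about 30
```

An estimator that only controls the Euclidean error could return the opposite tip, which in parameter terms is an error of almost the full parameter range.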

Consequently, in order to establish bounds relevant for parameter estimation, our concern focuses on guaranteeing that the geodesic distance $d_{\mathcal{M}}(x_{\hat{\theta}}, x_{\theta^*})$ is itself small.

Theorem 6 Suppose $x \in \mathbb{R}^N$. Let $\mathcal{M}$ be a compact $K$-dimensional Riemannian submanifold of $\mathbb{R}^N$ having condition number $1/\tau$, volume $V$, and geodesic covering regularity $R$. Fix $0 < \epsilon < 1$ and $0 < \rho < 1$. Let $\Phi$ be a random $M \times N$ orthoprojector, chosen independently of $x$, with $M$ as in (15). Let $\eta \in \mathbb{R}^M$, let $y = \Phi x + \eta$, and let the recovered estimate $\hat{x}$ and the optimal estimate $x^*$ be as defined in (10) and (11). If $M \le N$ and if $1.16\|\eta\|_2 + \|x - x^*\|_2 \le \tau/5$, then with probability at least $1 - \rho$ the following statement holds:

$$d_{\mathcal{M}}(\hat{x}, x^*) \le (4 + 0.5\epsilon)\|x - x^*\|_2 + (4 + 0.64\epsilon)\|\eta\|_2 + \frac{\epsilon^2 \tau}{468 N}. \tag{17}$$



Proof: See Appendix D. 

In several ways, this bound is similar to (16). Both bounds relate the recovery error to the proximity of $x$ to its nearest neighbor $x^*$ on the manifold and to the amount $\|\eta\|_2$ of additive measurement noise. Both bounds also have an additive term on the right-hand side that is small in relation to $\tau$.

In contrast, (17) guarantees that the recovered estimate $\hat{x}$ is near to the optimal estimate $x^*$ in terms of geodesic distance along the manifold. Establishing this condition required the additional assumption that $1.16\|\eta\|_2 + \|x - x^*\|_2 \le \tau/5$. Because $\tau$ relates to the degree to which the manifold can curve back upon itself at long geodesic distances, this assumption prevents exactly the type of "horseshoe" problem that was mentioned above, where it may happen that $d_{\mathcal{M}}(\hat{x}, x^*) \gg \|\hat{x} - x^*\|_2$. Suppose, for example, that $\|x - x^*\|_2 \approx \tau$ and that $x$ is approximately equidistant from both ends of the horseshoe; a small distortion of distances under $\Phi$ could then lead to an estimate $\hat{x}$ for which $\|x - \hat{x}\|_2 \approx \|x - x^*\|_2$ but $d_{\mathcal{M}}(\hat{x}, x^*) \gg 0$. Additive noise could likewise cause this kind of "crossing over" in the measurement space. Although our bound provides no guarantee in these situations, we stress that under these circumstances, accurate parameter estimation would be difficult (or perhaps even unimportant) in the original signal space $\mathbb{R}^N$.

Finally, we revisit the situation where the original signal $x \approx x_{\theta^*}$ for some $\theta^* \in \Theta$ (with $\theta^*$ satisfying (13)), where the measurements are $y = \Phi x + \eta$, and where the recovered estimate $\hat{\theta}$ satisfies (12). We consider the question of whether (17) can be translated into a bound on $d_\Theta(\hat{\theta}, \theta^*)$. As described in Section 3.1, in signal models where $\mathcal{M}$ is isometric to $\Theta$, this is automatic: we have simply that

$$d_{\mathcal{M}}(x_{\hat{\theta}}, x_{\theta^*}) = d_\Theta(\hat{\theta}, \theta^*).$$

Such signal models do exist. Work by Donoho and Grimes [19], for example, has characterized a variety of articulated image classes for which (7) holds or for which $d_{\mathcal{M}}(x_{\theta_1}, x_{\theta_2}) = C_8\, d_\Theta(\theta_1, \theta_2)$ for some constant $C_8 > 0$. In other models it may hold that

$$C_9\, d_{\mathcal{M}}(x_{\theta_1}, x_{\theta_2}) \le d_\Theta(\theta_1, \theta_2) \le C_{10}\, d_{\mathcal{M}}(x_{\theta_1}, x_{\theta_2})$$

for constants $C_9, C_{10} > 0$. Each of these relationships may be incorporated into the bound (17).
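To make the last sentence concrete, suppose the two-sided relation above holds, and identify the estimates of (10) and (11) with $\hat{x} = x_{\hat{\theta}}$ and $x^* = x_{\theta^*}$ (an assumption that holds when $\mathcal{M} = \{x_\theta : \theta \in \Theta\}$ and the minimizers are attained). Then, under the hypotheses of Theorem 6 and with probability at least $1 - \rho$, chaining the upper inequality with (17) gives

$$d_\Theta(\hat{\theta}, \theta^*) \le C_{10}\, d_{\mathcal{M}}(x_{\hat{\theta}}, x_{\theta^*}) \le C_{10}\left((4 + 0.5\epsilon)\|x - x^*\|_2 + (4 + 0.64\epsilon)\|\eta\|_2 + \frac{\epsilon^2\tau}{468N}\right).$$

In the isometric case the same statement holds with $C_{10} = 1$, and when $d_{\mathcal{M}} = C_8\, d_\Theta$ it holds with $C_{10} = 1/C_8$.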

6 Conclusions and future work 

In this paper, we have considered the tasks of signal recovery and parameter estimation using compressive measurements of a manifold-modeled signal. Although these problems differ substantially from the mechanics of sparsity-based signal recovery, we have seen a number of similarities that arise due to the low-dimensional geometry of each of the concise models. First, we have seen that a sufficient number of compressive measurements can guarantee a stable embedding of either type of signal family, and the requisite number of measurements scales linearly with the information level of the signal. Second, we have seen that deterministic instance-optimal bounds in $\ell_2$ are necessarily weak for both problems. Third, we have seen that probabilistic instance-optimal bounds in $\ell_2$ can be derived that give the optimal scaling with respect to the signal proximity to the concise model and with respect to the amount of measurement noise. Thus, our work supports the growing empirical evidence that manifold-based models can be used with high accuracy in compressive signal processing.

As discussed in Section 3.3, there remain several active topics of research. One matter concerns the problem of non-differentiable manifolds that arise from certain classes of articulated image models. Based on preliminary and empirical work [20,36], we believe that a combined multiscale regularization/measurement process is appropriate for such problems. However, a suitable theory should be developed to support this. A second topic of active research concerns fast algorithms for solving problems such as (10) and (12). Most successful approaches to date have combined initial coarse-scale discrete searches with iterative Newton-like refinements. Due to the problem-specific nuances that can arise in manifold models, it is unlikely that a single general-purpose algorithm analogous to $\ell_1$-minimization will emerge for solving these problems. Nonetheless, advances in these directions will likely be made by considering existing techniques for solving (11) and (13) in the native space, and perhaps by considering the multiscale measurement processes described above.

Finally, while we have not considered stochastic models for the parameter $\theta$ or the noise $\eta$, it would be interesting to consider these situations as well. A starting point for such statistical analysis may be the constrained Cramér-Rao Bound formulations [30, 34] in which an unknown parameter is constrained to live along a low-dimensional manifold. However, the appropriate approach may once again be problem-dependent, as the nearest-neighbor estimators (12), (13) we describe can be biased for nonlinear or non-isometric manifolds.
A Geodesic covering regularity

We briefly review the definition of geodesic covering regularity and refer the reader to [3] for a 
deeper discussion. 



Definition 2 Let $\mathcal{M}$ be a compact Riemannian submanifold of $\mathbb{R}^N$. Given $T > 0$, the geodesic covering number $G(T)$ of $\mathcal{M}$ is defined as the smallest number such that there exists a set $A$ of points on $\mathcal{M}$, $\#A = G(T)$, so that for all $x \in \mathcal{M}$,

$$\min_{a \in A} d_{\mathcal{M}}(x, a) \le T.$$

Definition 3 Let $\mathcal{M}$ be a compact $K$-dimensional Riemannian submanifold of $\mathbb{R}^N$ having volume $V$. We say that $\mathcal{M}$ has geodesic covering regularity $R$ for resolutions $T \le T_0$ if

$$G(T) \le \frac{R^K V K^{K/2}}{T^K} \tag{18}$$

for all $0 < T \le T_0$.
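As a small worked example of Definitions 2 and 3 in the case $K = 1$, where the volume $V$ is the curve's length: on an open curve, a geodesic ball of radius $T$ is an arc of length at most $2T$, so $G(T) = \lceil V/(2T) \rceil$; since $\lceil V/(2T) \rceil \le V/(2T) + 1 \le V/T$ whenever $T \le V/2$, such a curve satisfies (18) with $R = 1$ and $T_0 = V/2$. The sketch below checks this for a hypothetical curve of Gaussian pulses; the pulse family and the polygonal arc-length approximation are illustrative assumptions.

```python
import math
import numpy as np

def arc_length(points):
    """Cumulative arc length along a densely sampled curve (rows of `points` are samples in R^N).
    The polygonal length slightly underestimates the true length of a smooth curve."""
    steps = np.linalg.norm(np.diff(points, axis=0), axis=1)
    return np.concatenate([[0.0], np.cumsum(steps)])

def cover_open_curve(length, T):
    """Anchor positions (in arc length) whose geodesic T-balls cover an open curve of this length.
    Each ball covers an arc of length at most 2T, so ceil(length / (2T)) anchors is optimal."""
    n = max(1, math.ceil(length / (2 * T)))
    return np.minimum(T + 2 * T * np.arange(n), length)

# Hypothetical K = 1 manifold in R^128: unit-norm Gaussian pulses with centers in [0.2, 0.8].
t = np.linspace(0.0, 1.0, 128)
thetas = np.linspace(0.2, 0.8, 2001)
points = np.exp(-((t[None, :] - thetas[:, None]) ** 2) / (2 * 0.05 ** 2))
points /= np.linalg.norm(points, axis=1, keepdims=True)

s = arc_length(points)
V = s[-1]                                  # "volume" (length) of the 1-D manifold
for T in (V / 2, V / 5, V / 37.3):
    G = len(cover_open_curve(V, T))
    print(f"T={T:.3f}  G(T)={G}  V/T={V / T:.1f}")   # (18) with K = 1, R = 1 requires G(T) <= V / T
```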
B Proof of Theorem 4



Fix $\alpha \in [1 - \epsilon, 1 + \epsilon]$. We consider any two points $w_a, w_b \in \mathcal{M}$ such that

$$\frac{\|\Phi w_a - \Phi w_b\|_2}{\|w_a - w_b\|_2} = \alpha,$$

and, supposing that $x$ is closer to $w_a$, i.e.,

$$\|x - w_a\|_2 \le \|x - w_b\|_2,$$

but $\Phi x$ is closer to $\Phi w_b$, i.e.,

$$\|\Phi x - \Phi w_b\|_2 \le \|\Phi x - \Phi w_a\|_2,$$

we seek the maximum value that

$$\frac{\|x - w_b\|_2}{\|x - w_a\|_2}$$

may take. In other words, we wish to bound the worst possible "mistake" (according to our error criterion) between two candidate points on the manifold whose distance is scaled by the factor $\alpha$. This can be posed in the form of an optimization problem

$$\max_{x \in \mathbb{R}^N,\ w_a, w_b \in \mathcal{M}} \frac{\|x - w_b\|_2}{\|x - w_a\|_2} \quad \text{s.t.} \quad \|x - w_a\|_2 \le \|x - w_b\|_2, \quad \|\Phi x - \Phi w_b\|_2 \le \|\Phi x - \Phi w_a\|_2, \quad \frac{\|\Phi w_a - \Phi w_b\|_2}{\|w_a - w_b\|_2} = \alpha.$$



For simplicity, we may expand the constraint set to include all $w_a, w_b \in \mathbb{R}^N$; the solution to this larger problem is an upper bound for the solution to the case where $w_a, w_b \in \mathcal{M}$.

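The constraints and objective function now are invariant to adding a constant vector to all three variables or to a constant rescaling of all three. Hence, without loss of generality, we set $w_a = 0$ and $\|x\|_2 = 1$. This leaves

$$\max_{x, w_b \in \mathbb{R}^N} \|x - w_b\|_2 \quad \text{s.t.} \quad \|x\|_2 = 1, \quad \|x - w_b\|_2 \ge 1, \quad \|\Phi x - \Phi w_b\|_2 \le \|\Phi x\|_2, \quad \frac{\|\Phi w_b\|_2}{\|w_b\|_2} = \alpha.$$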



We may safely ignore the second constraint (because of its relation to the objective function), and 
we may also square the objective function (to be later undone). 

We recall that $\Phi = \sqrt{N/M}\,\Psi$, where $\Psi$ is an $M \times N$ matrix having orthonormal rows. We let $\Psi'$ be an $(N - M) \times N$ matrix having orthonormal rows that are orthogonal to the rows of $\Psi$, and we define $\Phi' = \sqrt{N/M}\,\Psi'$. It follows that for any $x' \in \mathbb{R}^N$,

$$\|\Phi x'\|_2^2 + \|\Phi' x'\|_2^2 = (N/M)\|x'\|_2^2.$$

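This leads to

$$\max_{x, w_b \in \mathbb{R}^N} (M/N)\left(\|\Phi x - \Phi w_b\|_2^2 + \|\Phi' x - \Phi' w_b\|_2^2\right)$$

subject to

$$\|\Phi x\|_2^2 + \|\Phi' x\|_2^2 = N/M, \quad \|\Phi x - \Phi w_b\|_2 \le \|\Phi x\|_2, \quad \frac{\|\Phi w_b\|_2^2}{\|\Phi w_b\|_2^2 + \|\Phi' w_b\|_2^2} = (M/N)\alpha^2.$$

The last constraint may be rewritten as

$$\|\Phi' w_b\|_2^2 = \|\Phi w_b\|_2^2 \left(\frac{N}{M\alpha^2} - 1\right).$$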



We note that the $\Phi$ and $\Phi'$ components of each vector may be optimized separately (subject to the listed constraints) because they are orthogonal components of that vector. Define $\beta$ to be the value of $\|\Phi' w_b\|_2$ taken for the optimal solution $w_b$. We note that the constraints refer to the norm of the vector $\Phi' w_b$ but not its direction. To maximize the objective function, then, $\Phi' w_b$ must be parallel to $\Phi' x$ but with the opposite sign. Equivalently, it must follow that

$$\Phi' w_b = -\beta \frac{\Phi' x}{\|\Phi' x\|_2}. \tag{19}$$



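We now consider the second term in the objective function. From (19), it follows that

$$\|\Phi' x - \Phi' w_b\|_2^2 = \left\|\Phi' x \left(1 + \frac{\beta}{\|\Phi' x\|_2}\right)\right\|_2^2 = \|\Phi' x\|_2^2 + 2\beta\|\Phi' x\|_2 + \beta^2. \tag{20}$$

The third constraint also demands that

$$\beta^2 = \|\Phi w_b\|_2^2 \left(\frac{N}{M\alpha^2} - 1\right).$$

Substituting into (20), we have

$$\|\Phi' x - \Phi' w_b\|_2^2 = \|\Phi' x\|_2^2 + 2\|\Phi' x\|_2 \|\Phi w_b\|_2 \sqrt{\frac{N}{M\alpha^2} - 1} + \|\Phi w_b\|_2^2 \left(\frac{N}{M\alpha^2} - 1\right).$$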



This is an increasing function of $\|\Phi w_b\|_2$, and so we seek the maximum value that $\|\Phi w_b\|_2$ may take subject to the constraints. From the second constraint we see that $\|\Phi x - \Phi w_b\|_2 \le \|\Phi x\|_2$, so by the triangle inequality $\|\Phi w_b\|_2 \le 2\|\Phi x\|_2$; thus, $\|\Phi w_b\|_2$ is maximized by letting $\Phi w_b = 2\Phi x$. With such a choice of $\Phi w_b$ we then have

$$\|\Phi x - \Phi w_b\|_2^2 = \|\Phi x\|_2^2.$$

We note that this choice of $\Phi w_b$ also maximizes the first term of the objective function subject to the constraints.
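We may now rewrite the optimization problem, in light of the above restrictions:

$$\max_{x} \; (M/N)\left(\|\Phi x\|_2^2 + \|\Phi' x\|_2^2 + 4\|\Phi x\|_2 \|\Phi' x\|_2 \sqrt{\frac{N}{M\alpha^2} - 1} + 4\|\Phi x\|_2^2 \left(\frac{N}{M\alpha^2} - 1\right)\right) \quad \text{s.t.} \quad \|\Phi x\|_2^2 + \|\Phi' x\|_2^2 = \frac{N}{M}.$$

We now seek to bound the maximum value that the objective function may take. We note that the single constraint implies (by the arithmetic-geometric mean inequality) that

$$\|\Phi x\|_2 \|\Phi' x\|_2 \le \frac{N}{2M}$$

and that $\|\Phi x\|_2 \le \sqrt{N/M}$ (but because these cannot be simultaneously met with equality, our bound will not be tight). It follows that

$$(M/N)\left(\|\Phi x\|_2^2 + \|\Phi' x\|_2^2 + 4\|\Phi x\|_2 \|\Phi' x\|_2 \sqrt{\frac{N}{M\alpha^2} - 1} + 4\|\Phi x\|_2^2 \left(\frac{N}{M\alpha^2} - 1\right)\right) \le (M/N)\left(\frac{N}{M} + \frac{2N}{M}\sqrt{\frac{N}{M\alpha^2} - 1} + \frac{4N}{M}\left(\frac{N}{M\alpha^2} - 1\right)\right) = \frac{4N}{M\alpha^2} - 3 + 2\sqrt{\frac{N}{M\alpha^2} - 1}.$$

Returning to the original optimization problem (for which we must now take a square root), this implies that

$$\frac{\|x - w_b\|_2}{\|x - w_a\|_2} \le \sqrt{\frac{4N}{M\alpha^2} - 3 + 2\sqrt{\frac{N}{M\alpha^2} - 1}}$$

for any observation $x$ that could be mistakenly paired with $w_b$ instead of $w_a$ (under a projection that scales the distance $\|w_a - w_b\|_2$ by $\alpha$). Considering the range of possible $\alpha$, and noting that this bound decreases as $\alpha$ increases, the worst case may happen when $\alpha = 1 - \epsilon$. □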
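The reduction in this proof can be checked numerically. For a random orthoprojector and hypothetical sizes $N = 64$, $M = 8$, $\alpha = 0.9$, the sketch below builds the configuration singled out above ($w_a = 0$, $\|x\|_2 = 1$, $\Phi w_b = 2\Phi x$, and $\Phi' w_b$ antiparallel to $\Phi' x$ with norm fixed by the third constraint), sweeps the split $\|\Phi x\|_2 = \sqrt{N/M}\cos\varphi$, $\|\Phi' x\|_2 = \sqrt{N/M}\sin\varphi$ allowed by the remaining constraint, and compares the largest ratio found with the closed-form bound. The name `extremal_pair` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, alpha = 64, 8, 0.9                       # hypothetical sizes and scaling factor
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
Psi, Psi_perp = Q[:M], Q[M:]                    # orthonormal rows; the two row spaces are orthogonal
s = np.sqrt(N / M)
Phi, Phi_perp = s * Psi, s * Psi_perp
u, v = Psi[0], Psi_perp[0]                      # unit vectors in the two row spaces
q = N / (M * alpha ** 2)

def extremal_pair(phi):
    """x (with w_a = 0) and w_b from the proof: ||x||_2 = 1, ||Phi x||_2 = sqrt(N/M) cos(phi),
    Phi w_b = 2 Phi x, and Phi' w_b antiparallel to Phi' x with the norm forced by the alpha constraint."""
    x = np.cos(phi) * u + np.sin(phi) * v
    beta = 2 * np.linalg.norm(Phi @ x) * np.sqrt(q - 1)   # ||Phi' w_b||_2
    w_b = 2 * np.cos(phi) * u - (beta / s) * v
    return x, w_b

ratios = []
for phi in np.linspace(0.0, np.pi / 2, 181):
    x, w_b = extremal_pair(phi)
    ratios.append(np.linalg.norm(x - w_b))      # equals ||x - w_b|| / ||x - w_a|| since w_a = 0, ||x|| = 1

closed_form = np.sqrt(4 * q - 3 + 2 * np.sqrt(q - 1))
print(max(ratios), closed_form)                # the closed form is an upper bound, not attained
```

The largest ratio lies strictly between $\sqrt{4q - 3}$ (attained at $\varphi = 0$) and the closed form, consistent with the remark that the two inequalities used above cannot hold with equality at the same time.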

C Proof of Theorem 5 



Following the proof of Theorem 3 (see [3]), we use the constants $\epsilon_1$ (a fixed fraction of $\epsilon$) and $T$ (a fixed multiple of $\epsilon^2\tau/N$) chosen there. We let $A$ be a minimal set of points on the manifold $\mathcal{M}$ such that, for every $x' \in \mathcal{M}$,

$$\min_{a \in A} d_{\mathcal{M}}(x', a) \le T. \tag{21}$$

We call $A$ the set of anchor points. From (18) we have that $\#A \le \frac{R^K V K^{K/2}}{T^K}$. The proof also describes a finite set of points $B \supseteq A$ and applies the Johnson-Lindenstrauss Lemma to this set to conclude that

$$(1 - \epsilon_1)\|b_1 - b_2\|_2 \le \|\Phi b_1 - \Phi b_2\|_2 \le (1 + \epsilon_1)\|b_1 - b_2\|_2 \tag{22}$$

holds for all $b_1, b_2 \in B$. The cardinality of the set $B$ dictates the requisite number of measurements $M$ in (8).

For our purposes, we define a new set $B' := B \cup \{x\} \cup \{x^*\}$. Noting that $\Phi$ is independent of both $x$ and $x^*$, we may apply the Johnson-Lindenstrauss Lemma to $B'$ instead and conclude that (22) holds for all $b_1, b_2 \in B'$. This new set has cardinality $|B'| \le |B| + 2$, and one may check that this does not change the order of the number of measurements required in (8).
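To see why adding two points to $B$ is harmless, recall that Johnson-Lindenstrauss-type guarantees require a number of measurements that grows only logarithmically in the cardinality of the point set. As an illustration only, the sketch below uses the Dasgupta-Gupta form of the lemma, $M \ge 4 \ln n / (\epsilon^2/2 - \epsilon^3/3)$ for $n$ points (stated there for squared distances), rather than the specific constants of [3]; the distortion level `eps1` is hypothetical.

```python
import math

def jl_dimension(n_points, eps):
    """Dasgupta-Gupta sufficient target dimension for embedding n_points with distortion eps."""
    return math.ceil(4 * math.log(n_points) / (eps ** 2 / 2 - eps ** 3 / 3))

eps1 = 0.05                                   # hypothetical distortion level
for size_B in (10 ** 3, 10 ** 6, 10 ** 9):
    m_B, m_Bprime = jl_dimension(size_B, eps1), jl_dimension(size_B + 2, eps1)
    print(size_B, m_B, m_Bprime, m_Bprime / m_B)
```

Since $|B|$ is already large (it grows with the covering number in (18)), replacing $\ln|B|$ by $\ln(|B| + 2)$ changes the required $M$ by a factor extremely close to 1.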

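Let $a$ denote the anchor point nearest to $\hat{x}$ in terms of $\ell_2$ distance in $\mathbb{R}^N$. It follows from (21) that $\|\hat{x} - a\|_2 \le T$, since the Euclidean distance between two points of $\mathcal{M}$ never exceeds their geodesic distance. Since $x, a \in B'$, we know that

$$(1 - \epsilon_1)\|x - a\|_2 \le \|\Phi x - \Phi a\|_2 \le (1 + \epsilon_1)\|x - a\|_2$$

and

$$(1 - \epsilon_1)\|x - x^*\|_2 \le \|\Phi x - \Phi x^*\|_2 \le (1 + \epsilon_1)\|x - x^*\|_2.$$

Also, since $\hat{x}, a \in \mathcal{M}$, we have from the conclusion of Theorem 3 that

$$(1 - \epsilon)\|\hat{x} - a\|_2 \le \|\Phi \hat{x} - \Phi a\|_2 \le (1 + \epsilon)\|\hat{x} - a\|_2.$$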

Finally, notice that by definition

$$\|x - x^*\|_2 \le \|x - \hat{x}\|_2$$

and

$$\|(\Phi x + \eta) - \Phi \hat{x}\|_2 \le \|(\Phi x + \eta) - \Phi x^*\|_2.$$

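Now, combining all of these bounds and using several applications of the triangle inequality we have

$$\begin{aligned}
\|x - \hat{x}\|_2 &\le \|x - a\|_2 + \|a - \hat{x}\|_2 \le \|x - a\|_2 + T \\
&\le \frac{1}{1 - \epsilon_1}\|\Phi x - \Phi a\|_2 + T \\
&\le \frac{1}{1 - \epsilon_1}\left(\|\Phi x - \Phi \hat{x}\|_2 + \|\Phi \hat{x} - \Phi a\|_2\right) + T \\
&\le \frac{1}{1 - \epsilon_1}\left(\|\Phi x - \Phi \hat{x} + \eta\|_2 + \|\eta\|_2 + \|\Phi \hat{x} - \Phi a\|_2\right) + T \\
&\le \frac{1}{1 - \epsilon_1}\left(\|\Phi x - \Phi x^* + \eta\|_2 + \|\eta\|_2 + \|\Phi \hat{x} - \Phi a\|_2\right) + T \\
&\le \frac{1}{1 - \epsilon_1}\left(\|\Phi x - \Phi x^*\|_2 + 2\|\eta\|_2 + \|\Phi \hat{x} - \Phi a\|_2\right) + T \\
&\le \frac{1}{1 - \epsilon_1}\left((1 + \epsilon_1)\|x - x^*\|_2 + 2\|\eta\|_2 + \|\Phi \hat{x} - \Phi a\|_2\right) + T \\
&\le \frac{1}{1 - \epsilon_1}\left((1 + \epsilon_1)\|x - x^*\|_2 + 2\|\eta\|_2 + T(1 + \epsilon)\right) + T \\
&= \frac{1 + \epsilon_1}{1 - \epsilon_1}\|x - x^*\|_2 + \frac{2}{1 - \epsilon_1}\|\eta\|_2 + T\left(\frac{1 + \epsilon}{1 - \epsilon_1} + 1\right).
\end{aligned}$$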



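One can check that

$$\frac{1}{1 - \epsilon_1} \le 1 + 0.16\epsilon, \qquad \frac{1 + \epsilon_1}{1 - \epsilon_1} \le 1 + 0.25\epsilon, \qquad \text{and} \qquad \frac{1 + \epsilon}{1 - \epsilon_1} \le 1 + 1.31\epsilon.$$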
Therefore,

$$\|x - \hat{x}\|_2 \le (2 + 0.32\epsilon)\|\eta\|_2 + (1 + 0.25\epsilon)\|x - x^*\|_2 + \frac{\epsilon^2\tau}{936N}.$$

□



D Proof of Theorem 6 

Using a simple triangle inequality and (16), we have

$$\|\hat{x} - x^*\|_2 \le \|\hat{x} - x\|_2 + \|x - x^*\|_2 \le (2 + 0.32\epsilon)\|\eta\|_2 + (2 + 0.25\epsilon)\|x - x^*\|_2 + \frac{\epsilon^2\tau}{936N}. \tag{23}$$

Now, since both $\hat{x}$ and $x^*$ belong to $\mathcal{M}$, we can invoke Lemma 2.3 from [33], which states that if $\|\hat{x} - x^*\|_2 \le \tau/2$, then

$$d_{\mathcal{M}}(\hat{x}, x^*) \le \tau - \tau\sqrt{1 - 2\|\hat{x} - x^*\|_2 / \tau}. \tag{24}$$

(This lemma guarantees that two points separated by a small Euclidean distance are also separated by a small geodesic distance, and so the manifold does not "curve back" upon itself.) To apply this lemma, it is sufficient to know that

$$(2 + 0.32\epsilon)\|\eta\|_2 + (2 + 0.25\epsilon)\|x - x^*\|_2 + \frac{\epsilon^2\tau}{936N} \le \tau/2,$$



i.e., that

$$\frac{2 + 0.32\epsilon}{2 + 0.25\epsilon}\|\eta\|_2 + \|x - x^*\|_2 \le \frac{\tau}{2 + 0.25\epsilon}\left(\frac{1}{2} - \frac{\epsilon^2}{936N}\right).$$

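For the sake of neatness, we may tighten this condition to $1.16\|\eta\|_2 + \|x - x^*\|_2 \le \tau/5$, which implies the sufficient condition above (since $\epsilon < 1$). Thus, if $\|x - x^*\|_2$ and $\|\eta\|_2$ are sufficiently small (on the order of $\tau$), then we may combine (23) and (24), giving

$$\begin{aligned}
d_{\mathcal{M}}(\hat{x}, x^*) &\le \tau - \tau\sqrt{1 - \frac{2}{\tau}\left((2 + 0.32\epsilon)\|\eta\|_2 + (2 + 0.25\epsilon)\|x - x^*\|_2 + \frac{\epsilon^2\tau}{936N}\right)} \\
&= \tau - \tau\sqrt{1 - \left(\frac{4 + 0.64\epsilon}{\tau}\|\eta\|_2 + \frac{4 + 0.5\epsilon}{\tau}\|x - x^*\|_2 + \frac{\epsilon^2}{468N}\right)}.
\end{aligned} \tag{25}$$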



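Under the assumption that $1.16\|\eta\|_2 + \|x - x^*\|_2 \le \tau/5$, it follows that

$$0 \le \frac{4 + 0.64\epsilon}{\tau}\|\eta\|_2 + \frac{4 + 0.5\epsilon}{\tau}\|x - x^*\|_2 + \frac{\epsilon^2}{468N} \le 1,$$

and so, since $\sqrt{1 - u} \ge 1 - u$ for $0 \le u \le 1$,

$$\sqrt{1 - \left(\frac{4 + 0.64\epsilon}{\tau}\|\eta\|_2 + \frac{4 + 0.5\epsilon}{\tau}\|x - x^*\|_2 + \frac{\epsilon^2}{468N}\right)} \ge 1 - \left(\frac{4 + 0.64\epsilon}{\tau}\|\eta\|_2 + \frac{4 + 0.5\epsilon}{\tau}\|x - x^*\|_2 + \frac{\epsilon^2}{468N}\right).$$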

This allows us to simplify (25) and gives our final result: If $1.16\|\eta\|_2 + \|x - x^*\|_2 \le \tau/5$, then

$$d_{\mathcal{M}}(\hat{x}, x^*) \le (4 + 0.64\epsilon)\|\eta\|_2 + (4 + 0.5\epsilon)\|x - x^*\|_2 + \frac{\epsilon^2\tau}{468N}.$$

□
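The numerical claims in this proof can be spot-checked on a grid of $\epsilon$ values: that $1.16\|\eta\|_2 + \|x - x^*\|_2 \le \tau/5$ implies the sufficient condition for Lemma 2.3 of [33], that it keeps the argument $u$ of the square root in (25) inside $[0, 1]$, and that $\sqrt{1 - u} \ge 1 - u$ there. This is a sanity check on a grid, not a proof; the choice $N = 1$ is the worst case for the $\epsilon^2/(936N)$ term, and the variable names are illustrative.

```python
import numpy as np

tau, N = 1.0, 1          # N = 1 makes the eps^2 / (936 N) term as large as it can be
eps_grid = np.linspace(1e-6, 1 - 1e-6, 1001)
# With a = ||eta||_2 >= 0 and b = ||x - x*||_2 >= 0, both quantities below are linear in (a, b)
# with positive coefficients, so over {1.16 a + b <= tau / 5} they peak at these two vertices.
vertices = [(tau / (5 * 1.16), 0.0), (0.0, tau / 5)]

worst_lhs, worst_u = 0.0, 0.0
for eps in eps_grid:
    for a, b in vertices:
        lhs = (2 + 0.32 * eps) * a + (2 + 0.25 * eps) * b + eps ** 2 * tau / (936 * N)   # right side of (23)
        u = 2 * lhs / tau                                                                 # argument in (25)
        worst_lhs, worst_u = max(worst_lhs, lhs), max(worst_u, u)

u_grid = np.linspace(0.0, 1.0, 10001)
sqrt_gap = np.min(np.sqrt(1 - u_grid) - (1 - u_grid))   # sqrt(1 - u) >= 1 - u on [0, 1]
print(worst_lhs, tau / 2, worst_u, sqrt_gap)
```

The worst case occurs at $\epsilon \to 1$ with all of the budget on $\|x - x^*\|_2$, where the right side of (23) is about $0.451\,\tau$, comfortably below $\tau/2$.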